'Put-That-There': Voice and Gesture at the Graphics Interface
Source
Richard A. Bolt. 1980. "'Put-That-There': Voice and Gesture at the Graphics Interface." Proceedings of SIGGRAPH '80, 262–270.
Reference video: https://youtu.be/RyBEUyEtxQo
Further Reading
Bolt, Richard. The Human Interface: Where People and Computers Meet. Belmont, Calif.: Lifetime Learning Publications, 1984.
Hinckley, Ken, Randy Pausch, John C. Goble, and Neal F. Kassell. "A Survey of Design Issues in Spatial Input." Proceedings of UIST '94, 213–222. November 1994.
Special Issue on "Multimodal Computer-Human Interaction." International Journal of Man-Machine Studies 28(2,3). 1988.
Introduction
Multimodal interfaces combine speech and gesture input. The central idea is to make the computer interface work like spoken conversation, a concept that goes back to Richard Bolt's "Put-That-There" research. Even today, data on graphical computers is presented almost entirely in two-dimensional space. The Media Room, set up by Nicholas Negroponte's Architecture Machine Group at MIT, used a two-dimensional screen to provide a view into a three-dimensional space, with the screen and speakers situated in the architecture of the room itself. By building this extravagant computing environment, the group produced a strikingly different and very useful form of human-computer interaction. Multimodal interfaces let a person communicate with a computer through inputs such as speech, gesture, gaze, and facial tracking; the interface behind Put-That-There combines speech and gesture.

Summary: The concept of Richard Bolt's "Put-That-There" is an interface that works like spoken language. It was designed with screens and speakers placed in three-dimensional space, with speech and gesture as the input methods, and it created a highly distinctive HCI experience.

1. Voice and Gesture at the Graphics Interface
The Massachusetts Institute of Technology's Architecture Machine Group has been experimenting with voice input and gesture sensing to control events on a large raster-scan graphics display. The interactions described here take place in the group's "Media Room," a physical space in which the user's terminal is a room one walks into, rather than a desktop CRT in front of which one sits. The Media Room, with its instrumented user chair, has played an important role in the group's work toward a "Spatial Data-Management System" (SDMS), in addition to serving as the embodiment of the user terminal. The rationale for spatially indexing data derives from our everyday experience of retrieving items, say, from our desktop: the phone to the right, above the blotter; the appointment calendar at the lower right; the in-box near the ashtray at the lower left; and so forth. Beyond creating a convincing impression of interacting with an implicit, "virtual" world of data behind the frame of the physical interface, the Media Room setting implies yet another realm of space rife with interaction possibilities: the actual space of the Media Room itself. Two technological innovations, connected speech recognition and position sensing in space, are the key to unlocking this interactive potential.

Summary: The "Media Room" built at the Massachusetts Institute of Technology. It is an experiment in controlling events on a huge display using speech recognition and gesture sensing. The physical space of the room itself serves as the user's "terminal."
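The desktop rationale above amounts to indexing items by where they sit rather than by name alone. A minimal Python sketch of that idea, under assumptions of my own (the Item and SpatialStore names and the nearest-item lookup are illustrative, not from the paper):

```python
from dataclasses import dataclass
import math

@dataclass
class Item:
    name: str
    x: float   # horizontal position on the display surface
    y: float   # vertical position on the display surface

class SpatialStore:
    """Toy spatial index: items are retrieved by where they live, not by name."""

    def __init__(self):
        self.items: list[Item] = []

    def add(self, item: Item) -> None:
        self.items.append(item)

    def nearest(self, x: float, y: float) -> Item | None:
        """Return the item closest to a pointed-at position (the desktop
        metaphor: 'the calendar is at the lower right')."""
        if not self.items:
            return None
        return min(self.items, key=lambda i: math.hypot(i.x - x, i.y - y))

store = SpatialStore()
store.add(Item("phone", 0.9, 0.8))      # to the right, above the blotter
store.add(Item("calendar", 0.9, 0.1))   # lower right
store.add(Item("in-box", 0.1, 0.1))     # lower left, near the ashtray
print(store.nearest(0.85, 0.15).name)   # -> "calendar"
```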
2. Speech and Space: The Technologies

Connected speech recognition has long been a difficult problem in the speech-recognition field. The DP-100 Connected Speech Recognition System (CSRS) from NEC (Nippon Electric Company) America, Inc. can recognize connected speech drawn from a limited vocabulary. For sensing position in space, a suitable technology came from Polhemus Navigation Sciences, Inc.: ROPAMS (Remote Object Position Attitude Measurement System), a system based on measurements of a rotating magnetic field. The sensor cube's position and orientation in space are determined by decoding the differential signals induced in the cube's three separate orthogonal coils.

Summary: A description of the enabling technologies. Recognizing sentences of connected speech had been technically difficult; the system solves its input problem by combining NEC's connected-speech recognizer with Polhemus's spatial position-and-orientation sensing.
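To turn the sensed position and orientation of the wrist cube into a spot on the screen, the system must intersect the user's pointing ray with the display surface. The paper does not spell out this computation; the following is a minimal sketch assuming the screen lies in the plane z = 0 and that a unit pointing direction has already been derived from the cube's orientation:

```python
import numpy as np

def pointing_spot(sensor_pos, pointing_dir, screen_z=0.0):
    """Intersect the pointing ray with the screen plane z = screen_z.

    sensor_pos:   (x, y, z) of the wrist-worn sensor cube
    pointing_dir: unit vector along the user's pointing direction,
                  derived from the cube's sensed orientation
    Returns (x, y) on the screen, or None if pointing away from it.
    """
    p = np.asarray(sensor_pos, dtype=float)
    d = np.asarray(pointing_dir, dtype=float)
    if abs(d[2]) < 1e-9:       # ray parallel to the screen plane
        return None
    t = (screen_z - p[2]) / d[2]
    if t <= 0:                 # screen is behind the user
        return None
    hit = p + t * d
    return float(hit[0]), float(hit[1])

# User two metres from the screen, pointing slightly up and to the right.
print(pointing_spot((0.0, 1.2, 2.0), (0.1, 0.05, -0.99)))
```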
3. Command

Assume the user is seated in front of the Media Room's enormous screen, wearing the space-sensing cube on his wrist, with the system's microphone ready to listen. Here are some examples of voice and pointing working in concert, drawn from the system's existing repertoire.
Summary: The user sits in front of the huge screen with a space-sensing cube attached to the wrist, and a microphone is set up for voice input. Instructions such as the following can be given by voice and pointing.
3.1 Command “Create . . .”
In this example system, simple items are summoned into existence, copied, have their properties changed, and are subsequently made to vanish. Basic shapes such as circles, squares, and diamonds are employed. In a creation command, the word "there" is a "call" to a routine that requires the input of a particular parameter: the position indicated by the user's pointing.
Summary: The "Create . . ." voice command. By voice input, the user can create primitive shapes (circles, squares, diamonds), copy them, change their properties, and delete them.
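A minimal sketch of how "there" might be resolved: when the pronoun is heard, the handler reads the current pointing position and uses it as the missing parameter. The function names and the shape representation are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch: the spoken word "there" acts as a call for a
# pointing parameter. get_pointing_spot() stands in for the sensor
# pipeline sketched earlier; all names here are assumptions.

shapes: list[dict] = []   # the visual mini-universe

def get_pointing_spot() -> tuple[float, float]:
    """Placeholder: would return the (x, y) the user is pointing at
    at the moment the word is uttered."""
    return (0.4, 0.6)

def create(shape: str, color: str, where: str) -> None:
    """Handle a command like 'Create a blue square there.'"""
    if where != "there":
        raise ValueError("only the pointing-resolved 'there' is sketched")
    x, y = get_pointing_spot()   # the pronoun is resolved by gesture
    shapes.append({"shape": shape, "color": color, "x": x, "y": y})

create("square", "blue", "there")
print(shapes)   # [{'shape': 'square', 'color': 'blue', 'x': 0.4, 'y': 0.6}]
```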
3.2 Command “Move . . .”
The user can move items around the screen easily, and the "move" command can be phrased in a variety of ways. In such a command, a voiced property like "green" is simply treated as part of the item's name; internally, a well-defined mapping from attribute name to item attribute can be established. The command "Copy . . ." is just a variant of the move action, with the exception that the image of the item to be moved stays in place at the original location.
Summary: The "Move . . ." command moves an object. The copy command operates under the same scheme.
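The attribute-name-to-item-attribute mapping mentioned above can be sketched directly: each spoken word constrains one attribute of the candidate items. A small Python illustration, with a made-up vocabulary table and item representation:

```python
# Resolving a spoken noun phrase like "the green circle" to an item.
# Each spoken attribute word constrains the corresponding item
# attribute via an explicit mapping.

ATTRIBUTE_OF = {            # spoken word -> item attribute it constrains
    "green": "color", "blue": "color", "red": "color",
    "circle": "shape", "square": "shape", "diamond": "shape",
    "large": "size", "small": "size",
}

def resolve(words: list[str], items: list[dict]) -> dict | None:
    """Return the unique item matching every attribute word, if any."""
    candidates = items
    for w in words:
        attr = ATTRIBUTE_OF.get(w)
        if attr is not None:
            candidates = [i for i in candidates if i.get(attr) == w]
    return candidates[0] if len(candidates) == 1 else None

items = [
    {"shape": "circle", "color": "green", "size": "small", "x": 0.2, "y": 0.5},
    {"shape": "square", "color": "blue",  "size": "large", "x": 0.7, "y": 0.3},
]
target = resolve(["green", "circle"], items)
target["x"], target["y"] = 0.8, 0.8     # "Move the green circle there."
print(target)
```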
3.3 Command “Make that . . .”
Any item in this visual mini-universe that the user has brought into existence by voice and gesture can have its properties changed. The phrase "Make the blue triangle smaller," for example, causes the indicated item to shrink in size. In the form "Make that . . . like that," the item indicated by the second "that" becomes the "model" for the modification.
Summary: The "Make that . . ." command changes an object's properties; for example, one says "Make the blue triangle smaller."
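One plausible reading of the "model" mechanism, sketched in Python: the second pointed-at item supplies the attribute value that is copied onto the first. The function and item structure are assumptions for illustration:

```python
# "Make that <attribute> like that": the item indicated by the second
# "that" is the model; its attribute value is copied onto the target.

def make_like(target: dict, model: dict, attribute: str) -> None:
    """Copy one attribute from the model item onto the target item."""
    target[attribute] = model[attribute]

triangle = {"shape": "triangle", "color": "blue", "size": "large"}
circle   = {"shape": "circle",   "color": "green", "size": "small"}

# First "that" points at the triangle, second "that" at the circle.
make_like(triangle, circle, "size")
print(triangle)   # {'shape': 'triangle', 'color': 'blue', 'size': 'small'}
```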
3.4 Command “Delete . . .”
The "delete" command allows you to remove objects from your display. The command's "operand" can be ".. the large blue circle" or ".that" (pointing to some item). The main idea is as follows: worldwide expungement, "clear" or "delete everything."
Summary: The "Delete . . ." command erases an object; "delete everything" clears the entire display.
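The two forms of the command differ only in whether an operand was resolved. A tiny sketch (structure assumed, continuing the item representation used above):

```python
# "Delete that" vs. "delete everything": single-item removal by pointing
# or by name, and global expungement of the display.

def delete(items: list[dict], target: dict | None = None) -> None:
    """Remove one resolved item, or everything when no target is given."""
    if target is None:      # "clear" / "delete everything"
        items.clear()
    else:                   # ". . . that" or ". . . the large blue circle"
        items.remove(target)

items = [{"shape": "circle", "color": "blue", "size": "large"}]
delete(items, items[0])    # "Delete the large blue circle."
delete(items)              # "Delete everything."
print(items)               # []
```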
7. Naming

The host system handles commands like "Call that . . . the calendar." The host switches the voice-recognition unit from "recognition mode" to "training mode"; a brief break in the spoken command line covers the time the mode change takes, so the speaker need not insert any deliberate pause beyond it. The speaker must pronounce the new name very clearly for it to be learned.

Summary: A description of the feature that lets the user attach names to the objects used in voice commands.
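A hedged sketch of that mode-switching flow: the host puts the recognizer into training mode, binds the newly trained word to the pointed-at item, and returns to recognition mode. The class and method names are hypothetical, not the paper's:

```python
# Sketch of the naming flow: the host flips the recognizer from
# recognition mode into training mode, trains the new word, and binds
# it to the pointed-at item. All names here are illustrative.

class Recognizer:
    def __init__(self):
        self.mode = "recognition"
        self.vocabulary: dict[str, dict] = {}   # trained word -> bound item

    def train(self, word: str, item: dict) -> None:
        self.mode = "training"        # host-initiated mode switch; the
                                      # pause in the command covers this
        self.vocabulary[word] = item  # the word must be spoken distinctly
        self.mode = "recognition"     # back to normal listening

recognizer = Recognizer()
pointed_at = {"shape": "square", "color": "blue"}
recognizer.train("calendar", pointed_at)   # "Call that . . . the calendar."
print(recognizer.vocabulary["calendar"])
```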
The host system handles commands like "call that. the calendar." The voice recognition unit is switched from "recognition mode" to "training mode" by the host. To account for the time it takes to change modes, the spoken command line has a brief break. It is not necessary for the speaker to pause for comments. The speaker must pronounce very clearly to add the name 音声指示で使用するオブジェクトのネーミングを追加できる機能の説明。